Easy and difficult exact covering problems arising in VLSI power reduction by clock gating
Author
Abstract
Several graph matching and exact covering problems arising in VLSI low-power design optimization by clock gating are presented. To maximize the power savings, clock gating requires optimal grouping of Flip-Flops (FFs), which depends on the FFs' data toggling correlations and probabilities. These naturally lead to optimal matching and exact covering problems. We present three problems arising from different clock-gating techniques. In a method called data-driven clock-gating, the corresponding covering problem is intractable but can be solved in practice by appropriate heuristics. In another method called multi-bit flip-flops, the covering problem is easily solvable in closed form, requiring only sorting. We finally present the covering problem arising in a new method called look-ahead clock-gating, for which the question of whether the exact covering problem is easy or difficult is left open. © 2014 Elsevier B.V. All rights reserved.

1. VLSI clock-gating and covering problems

The clock network, together with its underlying Flip-Flops (FFs), is typically responsible for 30%–70% of the total power consumed by modern Very Large Scale Integration (VLSI) digital systems, and is thus a primary candidate for power reduction [1,2]. Clock network power is consumed due to the toggling (switching) of the clock signal (pulse). A variety of techniques exist to reduce the clock power, of which clock-gating is predominant. It disables the clock signal when the underlying driven circuits are not going to change (toggle) their state, and hence do not need the clock. Data-Driven Clock-Gating (DDCG) techniques have been shown to be very effective, saving up to 20% of the total chip power [3]. DDCG disables the clock pulse driving the system's FFs [4,5] if they will not change their state (data) in the next clock cycle. It requires clustering the system's n FFs into groups of k FFs each, sharing a common clock signal.
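The effect of sharing a gated clock within a group can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each FF is described by a 0/1 toggle trace (1 when the FF changes state in a cycle) and that a gated group receives a clock pulse in a cycle exactly when at least one of its FFs toggles. The function name `clock_pulses` and the sample traces are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def clock_pulses(group: Sequence[Sequence[int]], gated: bool) -> int:
    """Count the clock pulses delivered to all FFs of one group.

    group: one 0/1 toggle trace per FF (hypothetical input format).
    gated: if False, every FF is clocked in every cycle; if True, the
           shared clock is enabled only in cycles where some FF toggles.
    """
    k = len(group)       # number of FFs sharing the clock
    m = len(group[0])    # number of observed cycles
    if not gated:
        return k * m
    # Cycles in which at least one FF of the group toggles.
    active_cycles = sum(1 for t in range(m) if any(ff[t] for ff in group))
    # The shared clock drives all k FFs in each active cycle.
    return k * active_cycles


# Two FFs observed over four cycles (made-up traces).
example_group = [[0, 1, 0, 0],
                 [0, 1, 1, 0]]
ungated = clock_pulses(example_group, gated=False)
gated = clock_pulses(example_group, gated=True)
```

In this toy case the ungated group receives 8 pulses and the gated group 4, since only two of the four cycles contain a toggle. Grouping FFs whose toggles coincide keeps the number of active cycles small.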
The amount of power saved by DDCG depends on the toggling probabilities and correlation of the FFs comprising a group. Obviously, the goal is for the joint clock signal driving an FF group to be disabled as often as possible. In typical VLSI designs n may vary from a few thousand to a few hundred thousand, while k may vary from two to a few dozen. DDCG implies a corresponding Min-Cost Exact Covering Problem (MCECP), where an n-size set should be exactly covered by $n/k$ k-size subsets $G_i$, $1 \le i \le n/k$, with cost $w_i$ reflecting the subset's power consumption. The cost $w_i$ in [5] was based on FFs toggling correlation (elaborated in Section 2), whereas finding the minimum power grouping has been shown in [6] to be NP-hard, and appropriate heuristics have been proposed for practical solution. The FFs grouping described in [7] was based on FFs toggling probabilities, rather than correlation. The implied MCECP has been shown to be well-solvable, having a closed-form solution which requires only sorting (elaborated in Section 3). Another clock-gating method called Look-Ahead Clock-Gating (LACG) has been proposed in [8]. There, the FFs groups are uniquely determined by the system's logic. MCECP arises, since in LACG the grouping of FF groups is of interest (elaborated in Section 4). The clock-gating methods discussed in this paper, with their underlying matching and covering algorithms, have lately been implemented in industrial environments and are currently used by companies such as Intel [8], Ceva and Mellanox [5].

∗ Correspondence to: Engineering Faculty, Bar-Ilan University, Ramat-Gan 52900, Israel. Tel.: +972 35317208; fax: +972 37384051. E-mail addresses: [email protected], [email protected].
http://dx.doi.org/10.1016/j.disopt.2014.08.004
1572-5286/© 2014 Elsevier B.V. All rights reserved.
S. Wimer / Discrete Optimization 14 (2014) 104–110

Fig. 1. Data-driven clock circuit. Overhead hardware is shaded in gray.

2. Covering problems implied by FFs toggling correlation

DDCG disables the clock signal driving an FF when the FF's state is not subject to change in the next clock cycle. A logic system comprising DDCG is illustrated in Fig. 1, where the hardware overhead to generate the clock disabling signal is shaded in gray. An XOR gate checks whether an FF's state is subject to change, thus determining whether its clock can be disabled in the next cycle. The outputs of k XOR gates are ORed and latched to generate a joint gating signal for the k FFs. There is a tradeoff between the number of saved (disabled) clock pulses and the hardware overhead, and the optimal k minimizing the power consumption was derived in [4]. The problem of which FFs should be placed in a group so as to minimize the power, and how to derive those groups, was studied in [6].

Let n FFs be clocked during $m + 1$ cycles, and let $a = (a_1, \ldots, a_m)$ be the activity of an FF. An entry $a_t = 0$, $1 \le t \le m$, if the FF stays unchanged (no toggling) from time $t - 1$ to time $t$, and $a_t = 1$ otherwise. The term $\|a\| = \sum_{t=1}^{m} a_t$ is proportional to the power consumed by the FF's switching. All the $n(n-1)/2$ pairs $a_i, a_j$, $1 \le i < j \le n$, are bit-wise XORed to yield the number $\|a_i \oplus a_j\|$ of redundant clock pulses occurring if $FF_i$ and $FF_j$ are grouped and share a common gater. The smaller it is, the more desirable it is to jointly clock $FF_i$ and $FF_j$.

To model the switching power consumed when driving FF pairs ($k = 2$) with a common clock gater, an n-vertex complete weighted graph $G(V, E, w)$ is defined. Assume w.l.o.g. that n is even (we could otherwise add a never-toggling artificial FF and set the weight of all its incident edges to zero). A vertex $v_i \in V$ is associated with $FF_i$'s activity $a_i$. An edge $e_{ij} = (v_i, v_j) \in E$ is associated with a joint activity vector $a_i \mid a_j$, where the OR is a bit-wise operation. An edge $e_{ij}$ is assigned a weight $w(e_{ij}) = \|a_i \oplus a_j\|$, counting the number of redundant clock pulses incurred by clocking $FF_i$ and $FF_j$ with a common gater.
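The graph model above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the algorithm of [6]: the activity vectors are made up, the helper names (`norm`, `bit_xor`, `bit_or`, `edge_weights`, `min_weight_pairing`) are hypothetical, and the pairing is found by exhaustive search, which is only feasible for a handful of FFs and stands in for whatever matching algorithm is used in practice. The sketch also checks an identity that follows directly from the definitions: a cycle in which either FF of a pair toggles clocks both FFs, so the pair receives $2\|a_i \mid a_j\| = \|a_i\| + \|a_j\| + \|a_i \oplus a_j\|$ pulses.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Activity = Sequence[int]  # 0/1 toggle indicator per clock cycle


def norm(a: Activity) -> int:
    """Number of toggles, i.e. the sum of the entries of a."""
    return sum(a)


def bit_xor(a: Activity, b: Activity) -> List[int]:
    return [x ^ y for x, y in zip(a, b)]


def bit_or(a: Activity, b: Activity) -> List[int]:
    return [x | y for x, y in zip(a, b)]


def edge_weights(acts: Sequence[Activity]) -> Dict[Tuple[int, int], int]:
    """Weight of edge (i, j), i < j: redundant pulses when i and j share a gater."""
    return {(i, j): norm(bit_xor(acts[i], acts[j]))
            for i, j in combinations(range(len(acts)), 2)}


def min_weight_pairing(acts: Sequence[Activity]) -> Tuple[int, List[Tuple[int, int]]]:
    """Exhaustive search over all pairings of an even number of FFs (tiny n only)."""
    w = edge_weights(acts)

    def solve(remaining: Tuple[int, ...]) -> Tuple[int, List[Tuple[int, int]]]:
        if not remaining:
            return 0, []
        first, rest = remaining[0], remaining[1:]
        best = None
        for idx, partner in enumerate(rest):
            cost, pairs = solve(rest[:idx] + rest[idx + 1:])
            cost += w[(first, partner)]
            if best is None or cost < best[0]:
                best = (cost, [(first, partner)] + pairs)
        return best

    return solve(tuple(range(len(acts))))


# Made-up activities of four FFs over five cycles.
activities = [[1, 0, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 1, 0, 0, 0]]
weights = edge_weights(activities)
best_cost, best_pairs = min_weight_pairing(activities)
```

For these sample vectors the cheapest pairing groups FF 0 with FF 1 and FF 2 with FF 3, with two redundant pulses in total, because the FFs in each pair toggle in almost the same cycles.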
Let $E' \subset E$, $|E'| = |V|/2$, be a vertex matching of $G(V, E, w)$. The total power consumed by the gated clock signal is proportional to the number P of pulses driving the underlying FFs, given by
Journal title: Discrete Optimization
Volume: 14
Publication date: 2014